On fitting the Pareto-Levy distribution to stock market index data: selecting a suitable cutoff value.
Abstract. The so-called Pareto-Levy or power-law distribution has been successfully used as a model to describe the probabilities associated with extreme variations of stock market index data worldwide; it has the form $\Pr(X > x) \sim x^{-\alpha}$ for $\gamma < x < \infty$. The threshold parameter γ is often selected from empirical data, and the exponent α consequently determined, by a simple graphical method based on a log-log scale, in which a power-law probability plot shows a straight line whose slope equals minus the exponent of the power-law distribution. This procedure can be considered subjective, particularly with regard to the choice of the threshold or cutoff parameter γ. In this work a more objective procedure is presented, based on a statistical measure of discrepancy between the empirical and the Pareto-Levy distributions. The technique is illustrated for data sets from the Dow Jones Industrial Average index and the Mexican Stock Market Index (IPC).

Key words. Econophysics, power-law, returns distribution, fit, empirical distribution function. 

PACS. 01.75.+m Science and Society - 02.50.-r Probability theory, stochastic processes and statistics 
- 02.50.Ng Distribution Theory and Monte Carlo studies - 89.65.Gh Economics; econophysics, financial 
markets, business and management - 89.90.+n Other areas of general interest to physicists 



1 Introduction 



The power-law distribution is present in a wide range of physical phenomena (phase transitions, nonlinear dynamics and disordered systems) [1, 2], financial phenomena (distributions of stock price and index variations, trading volumes and volatility decay) [3-10], and other kinds of social phenomena (the World Wide Web and Internet router links, sexual contact networks, growth of cities, citation networks in scientific journals, university entrance examinations, and traffic penalty distributions) [11-17]. All these systems share the property of complexity and are driven by collective mechanisms whose signature is the power-law distribution.

Studying variations of financial data is important in order 
to understand the stochastic process that drives them and 
also for practical purposes related to investment and risk 
management. 

In the analysis of stock market index variations, many observables are used [18]. In this work we have chosen the returns series. Briefly reviewing its definition: if X(t) denotes the value of a particular index at time t, its return series is defined as

$$S(t) = \log X(t + \Delta t) - \log X(t),$$

that is, as the logarithmic changes in the value of the index over a time interval Δt, which can range from a few seconds to many days.
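As a minimal sketch (plain Python, standard library only, not the authors' code), the return series can be computed from a list of index values sampled at regular intervals. The function name `log_returns` and its parameter `lag` are hypothetical; `lag` counts sampling steps and plays the role of Δt.

```python
import math


def log_returns(values, lag=1):
    """Return S(t) = log X(t + lag) - log X(t) for a list of index values X.

    `lag` is measured in sampling steps, e.g. lag=1 on daily closing values
    gives daily returns.
    """
    if lag < 1:
        raise ValueError("lag must be a positive integer")
    if any(v <= 0 for v in values):
        raise ValueError("index values must be strictly positive")
    return [math.log(values[i + lag]) - math.log(values[i])
            for i in range(len(values) - lag)]


# Hypothetical closing values, for illustration only (not IPC or DJIA data).
closes = [100.0, 101.0, 99.5, 102.0]
daily = log_returns(closes)
```

Because logarithmic changes add up, the daily returns over a period sum to the return over the whole period, which is one practical reason to prefer them over simple percentage changes.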

Several empirical studies have reported that the Pareto-Levy distribution is a useful model for computing the probabilities of extreme return variations. However, when fitting the power law to empirical data, the choice of the threshold or cutoff parameter of the tail distribution does not seem to follow an objective procedure. Usually, its fitted value is obtained by judging the degree of linearity in a log-log plot involving the empirical and theoretical distributions; moreover, recent well-founded studies have criticized the reliability of this geometrical method [19, 20].

Independently of the studies just cited, in this work we propose a formal procedure to improve the quality of the log-log plot fit, based on measures of discrepancy between the empirical and theoretical distribution functions. To illustrate our technique, we study the daily returns distributions of both an emerging and a well-developed stock market: the Mexican stock market index IPC¹ and the American DJIA². Figures 1 and 2 show the daily returns distributions for these financial markets, respectively.

¹ Indice de Precios y Cotizaciones, which means Prices and Quotations Index.

² Dow Jones Industrial Average Index.





[Figure: panel "IPC"; vertical axis: Density]

Fig. 1. Density histogram for daily logarithmic differences of the Mexican IPC index, from April 19, 1990 to September 17, 2004.
Fig. 2. Density histogram for daily logarithmic differences of the Dow Jones index, from April 19, 1990 to September 17, 2004.

2 Choosing the value of the threshold parameter

Let S be an absolutely continuous random variable and let us assume that there exists a value γ such that, for s > γ, S follows a Pareto-Levy distribution with parameters α > 0 and γ; that is, when s > γ,

$$F_0(s) = 1 - \left(\frac{\gamma}{s}\right)^{\alpha}. \qquad (1)$$

If the value of γ were known, we could measure the fit of the Pareto-Levy model using, for example, one of the so-called quadratic statistics:

$$Q_n = n \int_{\gamma}^{\infty} \left[F_n(s) - F_0(s)\right]^2 \psi(s)\, dF_0(s),$$

where F_n denotes the empirical distribution function and ψ is a weighting function; for example:

- For $\psi(z) = 1$, $Q_n$ becomes the Cramér-von Mises $W^2$ statistic.

- For $\psi(z) = \{F_0(z)[1 - F_0(z)]\}^{-1}$, $Q_n$ is the well-known Anderson-Darling $A^2$ statistic.

Focusing on the latter, if we denote by A²(γ) the value of the A² statistic computed for a given value of γ, the fitted value of γ can be chosen as the value that minimizes A²(γ). The computational details are described in the next section; for a more detailed treatment of statistics based on the empirical distribution function, the interested reader is referred to [21].

3 Computing formulas

Let $S_{(1)} \le S_{(2)} \le \dots \le S_{(n(\gamma))}$ denote the ordered values of the observed series $s(t_1), s(t_2), \dots, s(t_{n(\gamma)})$, where $s(t_i) \ge \gamma$ for $i = 1, \dots, n(\gamma)$ and n(γ) denotes the number of observations in the sample that are greater than or equal to a given admissible value of γ. The steps of our method can then be enumerated as follows:

1. Estimate the shape parameter α(γ) by

$$\hat{\alpha}(\gamma) = \left[\frac{1}{n(\gamma)} \sum_{i=1}^{n(\gamma)} \log\frac{S_{(i)}}{\gamma}\right]^{-1}.$$

2. For $i = 1, \dots, n(\gamma)$, compute the quantities

$$Z_{(i)} = 1 - \left(\frac{\gamma}{S_{(i)}}\right)^{\hat{\alpha}(\gamma)}.$$

3. Compute the value of the Anderson-Darling statistic using

$$A^2(\gamma) = -n(\gamma) - \frac{1}{n(\gamma)} \sum_{i=1}^{n(\gamma)} (2i - 1)\left[\log Z_{(i)} + \log\left(1 - Z_{(n(\gamma) - i + 1)}\right)\right].$$

Starting with, say, γ_1 = s_min in the complete sample, a sequence γ_1, ..., γ_r can be constructed, to the desired accuracy, producing the sequence of values A²(γ_1), ..., A²(γ_r). As shown in the next section, a plot of A²(γ_j) versus γ_j is useful for finding the value of γ that minimizes A².
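The three steps and the scan over γ can be sketched in Python as follows. This is a minimal illustration under our own assumptions, not the authors' code: the candidate thresholds are the ordered positive sample values; only observations strictly above γ enter each fit (so that Z_(1) > 0 and log Z_(1) is finite, whereas the text uses "greater than or equal"); the hypothetical parameter `min_tail` stops the scan when too few observations remain; and log Z_(i) and log(1 − Z_(i)) are evaluated through α̂ log(S_(i)/γ) to limit round-off, which is algebraically the same formula as in Step 3.

```python
import math
import random
from bisect import bisect_right


def fit_alpha(tail, gamma):
    """Step 1: alpha_hat(gamma) = n / sum(log(S_(i) / gamma))."""
    return len(tail) / sum(math.log(s / gamma) for s in tail)


def anderson_darling(tail, gamma):
    """Steps 1-3 for one threshold; returns (A^2(gamma), alpha_hat).

    Assumes every value in `tail` is strictly greater than `gamma`.
    """
    s = sorted(tail)
    n = len(s)
    alpha = fit_alpha(s, gamma)
    # Step 2 in log form: with t_i = alpha * log(S_(i) / gamma) > 0,
    # 1 - Z_(i) = exp(-t_i) and Z_(i) = -expm1(-t_i).
    t = [alpha * math.log(x / gamma) for x in s]
    total = 0.0
    for i in range(1, n + 1):  # i = 1, ..., n(gamma), as in the text
        log_z = math.log(-math.expm1(-t[i - 1]))  # log Z_(i)
        log_one_minus_z = -t[n - i]               # log(1 - Z_(n-i+1))
        total += (2 * i - 1) * (log_z + log_one_minus_z)
    # Step 3: A^2(gamma) = -n - (1/n) * sum(...)
    return -n - total / n, alpha


def select_cutoff(values, min_tail=10):
    """Scan gamma over the ordered positive sample values.

    Returns (gamma, A^2, alpha_hat, n) for the gamma with the smallest A^2.
    `min_tail` is a hypothetical stopping rule, not taken from the text.
    """
    data = sorted(v for v in values if v > 0)
    best = None
    for gamma in data:
        tail = data[bisect_right(data, gamma):]  # observations strictly above gamma
        if len(tail) < min_tail:
            break
        a2, alpha = anderson_darling(tail, gamma)
        if best is None or a2 < best[1]:
            best = (gamma, a2, alpha, len(tail))
    return best


# Synthetic illustration (not market data): an exact Pareto sample with
# gamma = 0.02 and alpha = 3, drawn with random.paretovariate.
rng = random.Random(12345)
synthetic = [0.02 * rng.paretovariate(3.0) for _ in range(500)]
print(select_cutoff(synthetic))
```

For a negative tail, the same scan can be applied to the negated series, for example `select_cutoff([-x for x in returns])`, where `returns` is a hypothetical list of S(t) values. The scan costs O(N²) operations for N positive values, which is still fast for a few thousand daily returns.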



H.F. Coronel-Brizio and A.R. Hernandez-Montoya: On fitting the Pareto-Levy distribution 






[Figure: A² versus gamma]

Fig. 3. Anderson-Darling A² statistic versus selected values of the threshold parameter γ, corresponding to the positive values of the series S(t), computed from the IPC index data. The minimum value, 0.16, is attained for γ = 0.0395.

4 Data Analysis 

To illustrate the technique, two data sets were analyzed. For both data sets, the daily returns series S(t) was constructed. The first consists of daily values of the Mexican stock market index (IPC), covering the period from April 19, 1990 to September 17, 2004. The S(t) series, which we denote here by S_IPC(t), had 3608 values, of which 1877 positive and 1723 negative values were used for the analysis. The second data set consists of daily values of the Dow Jones index recorded from April 19, 1990 to September 17, 2004. The S(t) series constructed from these data, denoted here by S_DJ(t), had 3633 values; here the analysis was based on 1899 positive and 1723 negative values. The IPC and DJIA data were downloaded from [22] and [23], respectively.

Using the procedure described in the previous section, the Pareto distribution was fitted to the positive and negative tails of the distributions of S_IPC(t) and S_DJ(t), varying the value of the parameter γ over the ordered sample values. In each case, the Anderson-Darling statistic was used as the goodness-of-fit criterion. Note that for the analysis of the negative tails, the values −S(t) were used.

Figures 3 to 6 show the plots of A² versus different values of the threshold parameter γ. Using this approach, we obtained the following results:

For the IPC index data, the best fit for the largest values in the positive tail is obtained for γ_0 = 0.0395, where the minimum value of A² is 0.16; this fit is based on the 64 largest positive observations, with an estimated shape parameter α = 3.822. For the negative tail, the minimum value of A² was found to be 0.50, for γ = 0.036; the fitted value of α was 3.507, based on the 69 smallest observations.

For the DJ index, the best positive-tail fit gives A² = 0.315 for γ = 0.0173 with α = 3.333; for the negative tail, the smallest value, A² = 0.18, is attained for γ = 0.0191 with α = 3.495. These results were obtained from the 158 largest and the 124 smallest observations of S_DJ(t), respectively.

Table 1 summarizes these results for easy reference.

Index  Tail      A²     γ        α     n
IPC    Positive  0.16    0.0395  3.82   64
IPC    Negative  0.50   -0.0360  3.51   69
DJ     Positive  0.32    0.0173  3.33  158
DJ     Negative  0.18   -0.0191  3.50  124

Table 1. Best left and right tail fits for the two analyzed series. For the negative tails, γ is reported on the scale of S(t).

Index  Tail      α (A² method)  α (graphical method)
IPC    Positive  3.82           3.00
IPC    Negative  3.51           2.82
DJ     Positive  3.33           2.85
DJ     Negative  3.50           2.80

Table 2. Fitted values of the parameter α obtained by the graphical and A² methods.
Fig. 4. Anderson-Darling A² statistic versus selected values of the threshold parameter γ, corresponding to the negative values of the series S(t), computed from the IPC index data. The minimum value, 0.50, is attained for γ = 0.036.



Table 2 suggests that our method tends to produce larger fitted values of the parameter α; moreover, we can expect a better fit from the parameter values produced by the A² method, since they are chosen precisely to minimize the discrepancy between the empirical and the fitted distribution functions. As an illustration, consider the regression line of log P(s) on log s, where P(s) = 1 − F_n(s), fitted to the positive tail of the Dow Jones S(t) series shown in Fig. 7. The graphical estimate α = 2.85 corresponds to minus the slope of the regression line, and it differs from our estimate α = 3.33. Figures 8 and 9 show the empirical F_n (dashed) and the fitted cumulative distribution function F_0 for each case, using γ = 0.0173. As expected, our estimates produce a better fit.
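For comparison, the graphical estimate can be sketched as an ordinary least-squares fit of log P(s) on log s over the observations s ≥ γ, with α taken as minus the fitted slope. This is an illustrative sketch, not necessarily how the values in Table 2 were computed: as an assumption, P(s) is evaluated at each ordered observation as (N − k)/N (0-based rank k among N observations), which avoids log 0 at the sample maximum; the function name `graphical_alpha` is hypothetical.

```python
import math


def graphical_alpha(sample, gamma):
    """Estimate alpha as minus the least-squares slope of log P(s) on log s, s >= gamma."""
    if gamma <= 0:
        raise ValueError("gamma must be positive so that log s is defined")
    xs = sorted(sample)
    total = len(xs)
    points = []
    for k, x in enumerate(xs):  # k = 0, 1, ..., total - 1
        if x >= gamma:
            # Assumed plotting position (total - k) / total, i.e. the fraction
            # of observations >= x when the values are distinct.
            p = (total - k) / total
            points.append((math.log(x), math.log(p)))
    if len(points) < 2:
        raise ValueError("need at least two observations at or above gamma")
    m = len(points)
    mean_u = sum(u for u, _ in points) / m
    mean_v = sum(v for _, v in points) / m
    sxy = sum((u - mean_u) * (v - mean_v) for u, v in points)
    sxx = sum((u - mean_u) ** 2 for u, _ in points)
    return -sxy / sxx


# Check on an exact power law: the survival fractions (50 - k) / 50 equal
# x ** -2 by construction, so the estimate is 2 up to round-off.
exact = [((50 - k) / 50) ** -0.5 for k in range(50)]
print(graphical_alpha(exact, 1.0))
```

On real data, the least-squares line gives equal weight to every point of the log-log plot, while the A² criterion measures the discrepancy between F_n and F_0 directly, so the two estimates need not agree, as Table 2 shows.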
Fig. 5. Anderson-Darling A² statistic versus selected values of the threshold parameter γ, corresponding to the positive values of the series S(t), computed from the Dow Jones index data. The minimum value, 0.32, is attained for γ = 0.0173.

Fig. 6. Anderson-Darling A² statistic versus selected values of the threshold parameter γ, corresponding to the negative values of the series S(t), computed from the Dow Jones index data. The minimum value, 0.18, is attained for γ = 0.0191.

Fig. 7. Linear fit for the positive tail of the Dow Jones S(t) series, using γ = 0.0173.



Fig. 8. Empirical (dashed) and fitted (solid) cumulative distribution functions for the positive tail of the Dow Jones S(t) series, using γ = 0.0173 and α = 2.85.



5 Conclusions 

An objective technique for fitting the power-law distribution to extreme variations in stock market indexes was presented. The method uses the Anderson-Darling statistic A² as a measure of discrepancy between the empirical and the theoretical distribution functions, and selects as fitted parameters the values that minimize this measure. The technique was illustrated for the Dow Jones Industrial Average index and the Mexican Prices and Quotations Index (IPC). The results show that this method can give better results than the traditional graphical method, in which the value of the cutoff parameter γ is chosen subjectively.
Fig. 9. Empirical (dashed) and fitted (solid) cumulative distribution functions for the positive tail of the Dow Jones S(t) series, using γ = 0.0173 and α = 3.33.






References 

1. N. Goldenfeld, Lectures on Phase Transitions and the Renormalization Group, Frontiers in Physics (1992).

2. U. Frisch, Turbulence: The Legacy of A. N. Kolmogorov, Cambridge University Press (1997).

3. V. Plerou, P. Gopikrishnan, L. A. Amaral, M. Meyer, H. E. 
Stanley, Phys.Rev. E 60,6, (1999) 6519-6529. 

4. P. Gopikrishnan et al, Eur. Phys. J.B 3, (1998) 139-140. 

5. P. Gopikrishnan et al, Phys.Rev. E 60, (1999) 5305-5316. 

6. T. Lux, Applied Financial Economics 6, (1996) 463-475 

7. V. Plerou, P. Gopikrishnan, L. A. Amaral, H. E. Stanley, Quant. Finance 1 (2001) 262-269.

8. J.-P. Bouchaud, Quantitative Finance 1 (2001) 105-112.

9. P. Cizeau, Y. Liu, M. Meyer, C.-K. Peng, H. E. Stanley, Physica A 245 (1997) 441-445.

10. Y. Liu, P. Gopikrishnan, P. Cizeau, M. Meyer, C. Peng, 
and H. E. Stanley Phys.Rev. E 60,2, (1999) 1390-1400. 

11. R. Albert and A.-L. Barabasi, Reviews of Modern Physics 74, 47-97 (2002).

12. R. Albert, H. Jeong and A.-L. Barabasi, Nature 401, 130-131 (1999).

13. F. Liljeros, C.R. Edling, L.A.N. Amaral, et al., Nature 
411, 907, (2001). 

14. 

15. S. Redner, Eur.Phys.Journ. B 4, 131 (1998). 

16. H. M. Gupta, J. R. Campanha and F. D. Prado, International Journal of Modern Physics C 11, 6 1273-1279 (2000).

17. R. A. Nobrega, C. C. Rodegheri and R. C. Povoas, International Journal of Modern Physics C 11, 7 1475-1479 (2000).

18. R. N. Mantegna and H. E. Stanley, An Introduction to Econophysics (Cambridge University Press, United Kingdom, 2000).

19. M. L. Goldstein, S. A. Morris and G. G. Yen, "Problems with fitting to the power-law distribution," European Physical Journal B (in press).

20. R. Weron, International Journal of Modern Physics C 12, 2 209-223 (2001).

21. R. B. D'Agostino and M. A. Stephens, Goodness-of-Fit Techniques, Marcel Dekker, New York (1986).

22. IPC data base used in our analysis, is available at the 
Banco de Mexico website: http://www.banxico.org.mx 

23. www.yahoo.finance.com 



